Overview

Dataset statistics

Number of variables: 15
Number of observations: 4300713
Missing cells: 21638956
Missing cells (%): 33.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 492.2 MiB
Average record size in memory: 120.0 B
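
The derived figures in this table can be checked from the raw counts. The following is a minimal sketch, assuming that "Missing cells (%)" is missing cells divided by variables × observations and that 1 MiB is 1024 × 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 15
n_observations = 4_300_713
missing_cells = 21_638_956
total_size_mib = 492.2

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values reported here (33.5% and 120.0 B).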

Variable types

Categorical: 8
Numeric: 7

Warnings

country_region_code has a high cardinality: 134 distinct values [High cardinality]
country_region has a high cardinality: 135 distinct values [High cardinality]
sub_region_1 has a high cardinality: 1860 distinct values [High cardinality]
sub_region_2 has a high cardinality: 9915 distinct values [High cardinality]
metro_area has a high cardinality: 65 distinct values [High cardinality]
iso_3166_2_code has a high cardinality: 2224 distinct values [High cardinality]
place_id has a high cardinality: 13276 distinct values [High cardinality]
date has a high cardinality: 368 distinct values [High cardinality]
sub_region_1 has 73227 (1.7%) missing values [Missing]
sub_region_2 has 715894 (16.6%) missing values [Missing]
metro_area has 4276928 (99.4%) missing values [Missing]
iso_3166_2_code has 3531512 (82.1%) missing values [Missing]
census_fips_code has 3384928 (78.7%) missing values [Missing]
retail_and_recreation has 1598804 (37.2%) missing values [Missing]
grocery_and_pharmacy has 1694251 (39.4%) missing values [Missing]
parks has 2235583 (52.0%) missing values [Missing]
transit_stations has 2127730 (49.5%) missing values [Missing]
workplaces has 198009 (4.6%) missing values [Missing]
residential has 1798617 (41.8%) missing values [Missing]
metro_area is uniformly distributed [Uniform]
grocery_and_pharmacy has 69501 (1.6%) zeros [Zeros]
workplaces has 67235 (1.6%) zeros [Zeros]
residential has 91313 (2.1%) zeros [Zeros]
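
To illustrate how warnings of this kind can be derived from per-column counts, here is a rough sketch. The three thresholds are assumptions chosen only to be consistent with the warnings listed in this report; they are not taken from the tool's configuration, and the Uniform warning is not modelled. The function and dictionary names are hypothetical.

```python
# Per-column counts copied from this report.
N_ROWS = 4_300_713

# Assumed thresholds (not confirmed by the report).
CARDINALITY_THRESHOLD = 50   # distinct values, categorical columns only
MISSING_THRESHOLD = 0.01     # fraction of rows
ZEROS_THRESHOLD = 0.01       # fraction of rows

columns = {
    "country_region_code": {"categorical": True, "distinct": 134, "missing": 2737},
    "metro_area": {"categorical": True, "distinct": 65, "missing": 4276928},
    "retail_and_recreation": {"categorical": False, "distinct": 425,
                              "missing": 1598804, "zeros": 39043},
    "grocery_and_pharmacy": {"categorical": False, "distinct": 537,
                             "missing": 1694251, "zeros": 69501},
}

def warnings_for(stats):
    """Return the warning labels that the assumed thresholds would raise."""
    found = []
    if stats["categorical"] and stats["distinct"] > CARDINALITY_THRESHOLD:
        found.append("High cardinality")
    if stats["missing"] / N_ROWS > MISSING_THRESHOLD:
        found.append("Missing")
    if stats.get("zeros", 0) / N_ROWS > ZEROS_THRESHOLD:
        found.append("Zeros")
    return found

for name, stats in columns.items():
    print(name, warnings_for(stats))
```

Under these assumed thresholds, retail_and_recreation (0.9% zeros) raises only a Missing warning while grocery_and_pharmacy (1.6% zeros) also raises Zeros, which matches the list above.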

Reproduction

Analysis started: 2021-03-02 09:16:42.551556
Analysis finished: 2021-03-02 09:23:13.803285
Duration: 6 minutes and 31.25 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

country_region_code
Categorical

HIGH CARDINALITY

Distinct: 134
Distinct (%): < 0.1%
Missing: 2737
Missing (%): 0.1%
Memory size: 32.8 MiB

US: 934553
BR: 692859
IN: 244331
TR: 194737
GB: 153032
Other values (129): 2078464

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 8595952
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
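
As a small illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The input strings are illustrative and not taken from the dataset, and `category_counts` is a hypothetical helper. The two-letter codes map to the names used in this report (`Lu` is Uppercase Letter, `Ll` Lowercase Letter, `Zs` Space Separator, `Po` Other Punctuation). Note that `unicodedata` does not expose scripts or blocks, so those columns would need another source.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Illustrative strings; "\u00f4" is the precomposed letter o with circumflex.
sample = ["AE", "C\u00f4te d'Ivoire"]
counts = category_counts(sample)

for code, n in sorted(counts.items()):
    print(code, n)
```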

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AE
2nd row: AE
3rd row: AE
4th row: AE
5th row: AE
Value               Count     Frequency (%)
US                  934553    21.7%
BR                  692859    16.1%
IN                  244331    5.7%
TR                  194737    4.5%
GB                  153032    3.6%
AR                  150987    3.5%
PL                  139431    3.2%
NL                  127361    3.0%
CO                  116651    2.7%
AU                  98814     2.3%
Other values (124)  1445220   33.6%
Histogram of lengths of the category
Value               Count     Frequency (%)
us                  934553    21.7%
br                  692859    16.1%
in                  244331    5.7%
tr                  194737    4.5%
gb                  153032    3.6%
ar                  150987    3.5%
pl                  139431    3.2%
nl                  127361    3.0%
co                  116651    2.7%
au                  98814     2.3%
Other values (124)  1445220   33.6%

Most occurring characters

ValueCountFrequency (%)
R1215135
14.1%
S1128559
13.1%
U1066841
12.4%
B938709
10.9%
N553100
 
6.4%
T429985
 
5.0%
I400834
 
4.7%
A396127
 
4.6%
L348550
 
4.1%
C336102
 
3.9%
Other values (16)1782010
20.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter8595952
100.0%

Most frequent character per category

ValueCountFrequency (%)
R1215135
14.1%
S1128559
13.1%
U1066841
12.4%
B938709
10.9%
N553100
 
6.4%
T429985
 
5.0%
I400834
 
4.7%
A396127
 
4.6%
L348550
 
4.1%
C336102
 
3.9%
Other values (16)1782010
20.7%

Most occurring scripts

ValueCountFrequency (%)
Latin8595952
100.0%

Most frequent character per script

ValueCountFrequency (%)
R1215135
14.1%
S1128559
13.1%
U1066841
12.4%
B938709
10.9%
N553100
 
6.4%
T429985
 
5.0%
I400834
 
4.7%
A396127
 
4.6%
L348550
 
4.1%
C336102
 
3.9%
Other values (16)1782010
20.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII8595952
100.0%

Most frequent character per block

ValueCountFrequency (%)
R1215135
14.1%
S1128559
13.1%
U1066841
12.4%
B938709
10.9%
N553100
 
6.4%
T429985
 
5.0%
I400834
 
4.7%
A396127
 
4.6%
L348550
 
4.1%
C336102
 
3.9%
Other values (16)1782010
20.7%

country_region
Categorical

HIGH CARDINALITY

Distinct: 135
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.8 MiB

United States: 934553
Brazil: 692859
India: 244331
Turkey: 194737
United Kingdom: 153032
Other values (130): 2081201

Length

Max length: 22
Median length: 7
Mean length: 8.50622048
Min length: 4

Characters and Unicode

Total characters: 36582813
Distinct characters: 57
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: United Arab Emirates
2nd row: United Arab Emirates
3rd row: United Arab Emirates
4th row: United Arab Emirates
5th row: United Arab Emirates
Value               Count     Frequency (%)
United States       934553    21.7%
Brazil              692859    16.1%
India               244331    5.7%
Turkey              194737    4.5%
United Kingdom      153032    3.6%
Argentina           150987    3.5%
Poland              139431    3.2%
Netherlands         127361    3.0%
Colombia            116651    2.7%
Australia           98814     2.3%
Other values (125)  1447957   33.7%
Histogram of lengths of the category
Value               Count     Frequency (%)
united              1090522   20.0%
states              934553    17.1%
brazil              692859    12.7%
india               244331    4.5%
turkey              194737    3.6%
kingdom             153032    2.8%
argentina           150987    2.8%
poland              139431    2.6%
netherlands         127361    2.3%
colombia            116651    2.1%
Other values (147)  1621151   29.7%
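
The word-level counts above appear to come from lowercasing each value and splitting it on whitespace. That is an inference from the tables, not something the report states. A minimal sketch under that assumption, using three row counts copied from the value table for this variable:

```python
from collections import Counter

# Row counts for three country_region values, copied from this report.
value_counts = {
    "United States": 934553,
    "United Kingdom": 153032,
    "Brazil": 692859,
}

# Assumed tokenisation: lowercase, then split on whitespace.
word_counts = Counter()
for value, count in value_counts.items():
    for word in value.lower().split():
        word_counts[word] += count

print(word_counts["united"])
print(word_counts["states"])
```

From these three values alone, "united" reaches 1087585. The report's 1090522 is slightly higher, presumably because other values containing the word (for example United Arab Emirates, seen in the sample rows) contribute the remainder.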

Most occurring characters

ValueCountFrequency (%)
a4459300
12.2%
t3678259
 
10.1%
i3427324
 
9.4%
e3369243
 
9.2%
n2846412
 
7.8%
d2110345
 
5.8%
r1876235
 
5.1%
l1597440
 
4.4%
s1298608
 
3.5%
1164902
 
3.2%
Other values (47)10754745
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter29947345
81.9%
Uppercase Letter5463724
 
14.9%
Space Separator1164902
 
3.2%
Other Punctuation4291
 
< 0.1%
Open Punctuation1104
 
< 0.1%
Close Punctuation1104
 
< 0.1%
Dash Punctuation343
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a4459300
14.9%
t3678259
12.3%
i3427324
11.4%
e3369243
11.3%
n2846412
9.5%
d2110345
7.0%
r1876235
6.3%
l1597440
 
5.3%
s1298608
 
4.3%
o1066760
 
3.6%
Other values (18)4217419
14.1%
ValueCountFrequency (%)
S1141549
20.9%
U1105753
20.2%
B774886
14.2%
I325744
 
6.0%
P301515
 
5.5%
A300808
 
5.5%
C294625
 
5.4%
N264308
 
4.8%
T218315
 
4.0%
K174525
 
3.2%
Other values (14)561696
10.3%
ValueCountFrequency (%)
1164902
100.0%
ValueCountFrequency (%)
'4291
100.0%
ValueCountFrequency (%)
-343
100.0%
ValueCountFrequency (%)
(1104
100.0%
ValueCountFrequency (%)
)1104
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin35411069
96.8%
Common1171744
 
3.2%

Most frequent character per script

ValueCountFrequency (%)
a4459300
12.6%
t3678259
 
10.4%
i3427324
 
9.7%
e3369243
 
9.5%
n2846412
 
8.0%
d2110345
 
6.0%
r1876235
 
5.3%
l1597440
 
4.5%
s1298608
 
3.7%
S1141549
 
3.2%
Other values (42)9606354
27.1%
ValueCountFrequency (%)
1164902
99.4%
'4291
 
0.4%
(1104
 
0.1%
)1104
 
0.1%
-343
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII36577668
> 99.9%
None5145
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
a4459300
12.2%
t3678259
 
10.1%
i3427324
 
9.4%
e3369243
 
9.2%
n2846412
 
7.8%
d2110345
 
5.8%
r1876235
 
5.1%
l1597440
 
4.4%
s1298608
 
3.6%
1164902
 
3.2%
Other values (45)10749600
29.4%
ValueCountFrequency (%)
ô4291
83.4%
é854
 
16.6%

sub_region_1
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1860
Distinct (%): < 0.1%
Missing: 73227
Missing (%): 1.7%
Memory size: 32.8 MiB

State of São Paulo: 129391
State of Minas Gerais: 89699
Texas: 68947
State of Rio Grande do Sul: 53386
State of Paraná: 52646
Other values (1855): 3833417

Length

Max length: 74
Median length: 12
Mean length: 12.55948807
Min length: 3

Characters and Unicode

Total characters: 53095060
Distinct characters: 122
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Abu Dhabi
2nd row: Abu Dhabi
3rd row: Abu Dhabi
4th row: Abu Dhabi
5th row: Abu Dhabi
Value                       Count     Frequency (%)
State of São Paulo          129391    3.0%
State of Minas Gerais       89699     2.1%
Texas                       68947     1.6%
State of Rio Grande do Sul  53386     1.2%
State of Paraná             52646     1.2%
Georgia                     48680     1.1%
State of Bahia              48475     1.1%
Buenos Aires Province       45072     1.0%
State of Santa Catarina     43505     1.0%
Virginia                    43463     1.0%
Other values (1850)         3604222   83.8%
(Missing)                   73227     1.7%
Histogram of lengths of the category
Value                       Count     Frequency (%)
of                          719153    8.9%
state                       697350    8.6%
county                      222079    2.7%
province                    168491    2.1%
voivodeship                 139063    1.7%
region                      138620    1.7%
são                         129735    1.6%
paulo                       129391    1.6%
south                       103530    1.3%
north                       102582    1.3%
Other values (1969)         5567248   68.6%

Most occurring characters

ValueCountFrequency (%)
a6209722
 
11.7%
o4250463
 
8.0%
e3917832
 
7.4%
3889756
 
7.3%
t3576739
 
6.7%
i3528111
 
6.6%
n3391133
 
6.4%
r3024856
 
5.7%
s2209305
 
4.2%
l1495264
 
2.8%
Other values (112)17601879
33.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter41794233
78.7%
Uppercase Letter7308263
 
13.8%
Space Separator3889756
 
7.3%
Dash Punctuation93787
 
0.2%
Other Punctuation8653
 
< 0.1%
Format368
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a6209722
14.9%
o4250463
10.2%
e3917832
9.4%
t3576739
8.6%
i3528111
8.4%
n3391133
 
8.1%
r3024856
 
7.2%
s2209305
 
5.3%
l1495264
 
3.6%
u1457779
 
3.5%
Other values (65)8733029
20.9%
ValueCountFrequency (%)
S1346050
18.4%
P682467
 
9.3%
C656395
 
9.0%
M512113
 
7.0%
G439248
 
6.0%
A372370
 
5.1%
V338401
 
4.6%
R320626
 
4.4%
N320299
 
4.4%
B300446
 
4.1%
Other values (31)2019848
27.6%
ValueCountFrequency (%)
'4692
54.2%
.2860
33.1%
,1101
 
12.7%
ValueCountFrequency (%)
3889756
100.0%
ValueCountFrequency (%)
-93787
100.0%
ValueCountFrequency (%)
368
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin49102496
92.5%
Common3992196
 
7.5%
Inherited368
 
< 0.1%

Most frequent character per script

ValueCountFrequency (%)
a6209722
 
12.6%
o4250463
 
8.7%
e3917832
 
8.0%
t3576739
 
7.3%
i3528111
 
7.2%
n3391133
 
6.9%
r3024856
 
6.2%
s2209305
 
4.5%
l1495264
 
3.0%
u1457779
 
3.0%
Other values (106)16041292
32.7%
ValueCountFrequency (%)
3889756
97.4%
-93787
 
2.3%
'4692
 
0.1%
.2860
 
0.1%
,1101
 
< 0.1%
ValueCountFrequency (%)
368
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII52336167
98.6%
None754139
 
1.4%
Latin Ext Additional4386
 
< 0.1%
Punctuation368
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
a6209722
 
11.9%
o4250463
 
8.1%
e3917832
 
7.5%
3889756
 
7.4%
t3576739
 
6.8%
i3528111
 
6.7%
n3391133
 
6.5%
r3024856
 
5.8%
s2209305
 
4.2%
l1495264
 
2.9%
Other values (46)16842986
32.2%
ValueCountFrequency (%)
á165278
21.9%
ã144668
19.2%
í64570
 
8.6%
ı28765
 
3.8%
é28449
 
3.8%
İ24770
 
3.3%
ä23601
 
3.1%
ș21860
 
2.9%
ö20487
 
2.7%
ó17226
 
2.3%
Other values (48)214465
28.4%
ValueCountFrequency (%)
368
100.0%
ValueCountFrequency (%)
1089
24.8%
736
16.8%
736
16.8%
721
16.4%
368
 
8.4%
368
 
8.4%
368
 
8.4%

sub_region_2
Categorical

HIGH CARDINALITY
MISSING

Distinct: 9915
Distinct (%): 0.3%
Missing: 715894
Missing (%): 16.6%
Memory size: 32.8 MiB

Washington County: 9789
Jefferson County: 8108
Franklin County: 7658
Jackson County: 7003
Lincoln County: 6685
Other values (9910): 3545576

Length

Max length: 56
Median length: 13
Mean length: 13.21418571
Min length: 2

Characters and Unicode

Total characters: 47370464
Distinct characters: 141
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: Comuna 1
2nd row: Comuna 1
3rd row: Comuna 1
4th row: Comuna 1
5th row: Comuna 1
Value                Count     Frequency (%)
Washington County    9789      0.2%
Jefferson County     8108      0.2%
Franklin County      7658      0.2%
Jackson County       7003      0.2%
Lincoln County       6685      0.2%
Madison County       6409      0.1%
Montgomery County    5941      0.1%
Marion County        5592      0.1%
Monroe County        5373      0.1%
Union County         5372      0.1%
Other values (9905)  3516889   81.8%
(Missing)            715894    16.6%
Histogram of lengths of the category
Value                Count     Frequency (%)
county               1029129   15.4%
municipality         208758    3.1%
district             158550    2.4%
of                   118373    1.8%
department           88696     1.3%
province             83612     1.3%
do                   63457     1.0%
de                   61494     0.9%
city                 59400     0.9%
são                  39911     0.6%
Other values (9635)  4764709   71.4%

Most occurring characters

ValueCountFrequency (%)
a4289551
 
9.1%
o3646235
 
7.7%
n3542846
 
7.5%
i3188962
 
6.7%
3091558
 
6.5%
t3063376
 
6.5%
e2838905
 
6.0%
r2642997
 
5.6%
u2341440
 
4.9%
l1669866
 
3.5%
Other values (131)17054728
36.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter37669287
79.5%
Uppercase Letter6393647
 
13.5%
Space Separator3091558
 
6.5%
Dash Punctuation123940
 
0.3%
Other Punctuation52708
 
0.1%
Decimal Number39324
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a4289551
11.4%
o3646235
9.7%
n3542846
9.4%
i3188962
 
8.5%
t3063376
 
8.1%
e2838905
 
7.5%
r2642997
 
7.0%
u2341440
 
6.2%
l1669866
 
4.4%
y1589866
 
4.2%
Other values (64)8855243
23.5%
ValueCountFrequency (%)
C1560055
24.4%
M582116
 
9.1%
S490442
 
7.7%
P451553
 
7.1%
D405086
 
6.3%
B340117
 
5.3%
L244731
 
3.8%
A244650
 
3.8%
R209866
 
3.3%
G208229
 
3.3%
Other values (38)1656802
25.9%
ValueCountFrequency (%)
115150
38.5%
25111
 
13.0%
53048
 
7.8%
42707
 
6.9%
72637
 
6.7%
32585
 
6.6%
82277
 
5.8%
92229
 
5.7%
62149
 
5.5%
01431
 
3.6%
ValueCountFrequency (%)
.30454
57.8%
'10936
 
20.7%
,7920
 
15.0%
/3030
 
5.7%
&368
 
0.7%
ValueCountFrequency (%)
-123007
99.2%
592
 
0.5%
341
 
0.3%
ValueCountFrequency (%)
3091558
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin44062934
93.0%
Common3307530
 
7.0%

Most frequent character per script

ValueCountFrequency (%)
a4289551
 
9.7%
o3646235
 
8.3%
n3542846
 
8.0%
i3188962
 
7.2%
t3063376
 
7.0%
e2838905
 
6.4%
r2642997
 
6.0%
u2341440
 
5.3%
l1669866
 
3.8%
y1589866
 
3.6%
Other values (112)15248890
34.6%
ValueCountFrequency (%)
3091558
93.5%
-123007
 
3.7%
.30454
 
0.9%
115150
 
0.5%
'10936
 
0.3%
,7920
 
0.2%
25111
 
0.2%
53048
 
0.1%
/3030
 
0.1%
42707
 
0.1%
Other values (9)14609
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII46553939
98.3%
None815592
 
1.7%
Punctuation933
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
a4289551
 
9.2%
o3646235
 
7.8%
n3542846
 
7.6%
i3188962
 
6.9%
3091558
 
6.6%
t3063376
 
6.6%
e2838905
 
6.1%
r2642997
 
5.7%
u2341440
 
5.0%
l1669866
 
3.6%
Other values (59)16238203
34.9%
ValueCountFrequency (%)
ã85481
 
10.5%
á83885
 
10.3%
ó75673
 
9.3%
í71941
 
8.8%
é66464
 
8.1%
ç47131
 
5.8%
ı34176
 
4.2%
ö30393
 
3.7%
ł24462
 
3.0%
ú22359
 
2.7%
Other values (60)273627
33.5%
ValueCountFrequency (%)
592
63.5%
341
36.5%

metro_area
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 65
Distinct (%): 0.3%
Missing: 4276928
Missing (%): 99.4%
Memory size: 32.8 MiB

Port-au-Prince Metropolitan Area: 368
Naimey Metropolitan Area: 368
Davao City Metropolitan Area: 368
Eldoret Metropolitan Area: 368
Casablanca Metropolitan Area: 368
Other values (60): 21945

Length

Max length: 34
Median length: 25
Mean length: 25.6671852
Min length: 21

Characters and Unicode

Total characters: 610494
Distinct characters: 48
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Kabul Metropolitan Area
2nd row: Kabul Metropolitan Area
3rd row: Kabul Metropolitan Area
4th row: Kabul Metropolitan Area
5th row: Kabul Metropolitan Area
Value                             Count     Frequency (%)
Port-au-Prince Metropolitan Area  368       < 0.1%
Naimey Metropolitan Area          368       < 0.1%
Davao City Metropolitan Area      368       < 0.1%
Eldoret Metropolitan Area         368       < 0.1%
Casablanca Metropolitan Area      368       < 0.1%
Sargodha Metropolitan Area        368       < 0.1%
Chelyabinsk Metropolitan Area     368       < 0.1%
Bamako Metropolitan Area          368       < 0.1%
Bacolod Metropolitan Area         368       < 0.1%
Voronezh Metropolitan Area        368       < 0.1%
Other values (55)                 20105     0.5%
(Missing)                         4276928   99.4%
Histogram of lengths of the category
Value                             Count     Frequency (%)
area                              23785     32.2%
metropolitan                      23785     32.2%
city                              1104      1.5%
voronezh                          368       0.5%
mombasa                           368       0.5%
bacolod                           368       0.5%
nairobi                           368       0.5%
doha                              368       0.5%
yangon                            368       0.5%
pampanga                          368       0.5%
Other values (62)                 22681     30.7%

Most occurring characters

ValueCountFrequency (%)
a78760
12.9%
o61554
10.1%
r58565
9.6%
e56680
9.3%
t54194
8.9%
50146
 
8.2%
i34322
 
5.6%
n33353
 
5.5%
l30274
 
5.0%
M26729
 
4.4%
Other values (38)125917
20.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter484577
79.4%
Uppercase Letter74299
 
12.2%
Space Separator50146
 
8.2%
Dash Punctuation1472
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
a78760
16.3%
o61554
12.7%
r58565
12.1%
e56680
11.7%
t54194
11.2%
i34322
7.1%
n33353
6.9%
l30274
 
6.2%
p24153
 
5.0%
s7683
 
1.6%
Other values (15)45039
9.3%
ValueCountFrequency (%)
M26729
36.0%
A25257
34.0%
C2944
 
4.0%
K2899
 
3.9%
P2576
 
3.5%
S2576
 
3.5%
N2208
 
3.0%
D1472
 
2.0%
B1427
 
1.9%
F736
 
1.0%
Other values (11)5475
 
7.4%
ValueCountFrequency (%)
50146
100.0%
ValueCountFrequency (%)
-1472
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin558876
91.5%
Common51618
 
8.5%

Most frequent character per script

ValueCountFrequency (%)
a78760
14.1%
o61554
11.0%
r58565
10.5%
e56680
10.1%
t54194
9.7%
i34322
 
6.1%
n33353
 
6.0%
l30274
 
5.4%
M26729
 
4.8%
A25257
 
4.5%
Other values (36)99188
17.7%
ValueCountFrequency (%)
50146
97.1%
-1472
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII610494
100.0%

Most frequent character per block

ValueCountFrequency (%)
a78760
12.9%
o61554
10.1%
r58565
9.6%
e56680
9.3%
t54194
8.9%
50146
 
8.2%
i34322
 
5.6%
n33353
 
5.5%
l30274
 
5.0%
M26729
 
4.4%
Other values (38)125917
20.6%

iso_3166_2_code
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2224
Distinct (%): 0.3%
Missing: 3531512
Missing (%): 82.1%
Memory size: 32.8 MiB

KE-10: 368
BR-GO: 368
PL-PK: 368
VN-54: 368
FR-49: 368
Other values (2219): 767361

Length

Max length: 6
Median length: 5
Mean length: 5.191745721
Min length: 4

Characters and Unicode

Total characters: 3993496
Distinct characters: 37
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: AE-AZ
2nd row: AE-AZ
3rd row: AE-AZ
4th row: AE-AZ
5th row: AE-AZ
Value                Count     Frequency (%)
KE-10                368       < 0.1%
BR-GO                368       < 0.1%
PL-PK                368       < 0.1%
VN-54                368       < 0.1%
FR-49                368       < 0.1%
EC-T                 368       < 0.1%
PE-LMA               368       < 0.1%
RO-BN                368       < 0.1%
ES-CC                368       < 0.1%
PE-LAL               368       < 0.1%
Other values (2214)  765521    17.8%
(Missing)            3531512   82.1%
Histogram of lengths of the category
Value                Count     Frequency (%)
ar-f                 368       < 0.1%
fr-66                368       < 0.1%
it-at                368       < 0.1%
sk-bc                368       < 0.1%
in-dn                368       < 0.1%
it-62                368       < 0.1%
jp-03                368       < 0.1%
cz-ka                368       < 0.1%
ca-ab                368       < 0.1%
ch-sg                368       < 0.1%
Other values (2214)  765521    99.5%

Most occurring characters

ValueCountFrequency (%)
-769201
19.3%
B179646
 
4.5%
S167524
 
4.2%
R166616
 
4.2%
G163832
 
4.1%
T161735
 
4.0%
I156263
 
3.9%
N147609
 
3.7%
C146230
 
3.7%
E145001
 
3.6%
Other values (27)1789839
44.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2590292
64.9%
Dash Punctuation769201
 
19.3%
Decimal Number634003
 
15.9%

Most frequent character per category

ValueCountFrequency (%)
B179646
 
6.9%
S167524
 
6.5%
R166616
 
6.4%
G163832
 
6.3%
T161735
 
6.2%
I156263
 
6.0%
N147609
 
5.7%
C146230
 
5.6%
E145001
 
5.6%
A130373
 
5.0%
Other values (16)1025463
39.6%
ValueCountFrequency (%)
1110872
17.5%
0109522
17.3%
284690
13.4%
366125
10.4%
458790
9.3%
552134
8.2%
643778
 
6.9%
742616
 
6.7%
837363
 
5.9%
928113
 
4.4%
ValueCountFrequency (%)
-769201
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2590292
64.9%
Common1403204
35.1%

Most frequent character per script

ValueCountFrequency (%)
B179646
 
6.9%
S167524
 
6.5%
R166616
 
6.4%
G163832
 
6.3%
T161735
 
6.2%
I156263
 
6.0%
N147609
 
5.7%
C146230
 
5.6%
E145001
 
5.6%
A130373
 
5.0%
Other values (16)1025463
39.6%
ValueCountFrequency (%)
-769201
54.8%
1110872
 
7.9%
0109522
 
7.8%
284690
 
6.0%
366125
 
4.7%
458790
 
4.2%
552134
 
3.7%
643778
 
3.1%
742616
 
3.0%
837363
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII3993496
100.0%

Most frequent character per block

ValueCountFrequency (%)
-769201
19.3%
B179646
 
4.5%
S167524
 
4.2%
R166616
 
4.2%
G163832
 
4.1%
T161735
 
4.0%
I156263
 
3.9%
N147609
 
3.7%
C146230
 
3.7%
E145001
 
3.6%
Other values (27)1789839
44.8%

census_fips_code
Real number (ℝ≥0)

MISSING

Distinct: 2838
Distinct (%): 0.3%
Missing: 3384928
Missing (%): 78.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 30356.46021
Minimum: 1001
Maximum: 56045
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 MiB

Quantile statistics

Minimum: 1001
5-th percentile: 5099
Q1: 18105
median: 29115
Q3: 45051
95-th percentile: 53075
Maximum: 56045
Range: 55044
Interquartile range (IQR): 26946

Descriptive statistics

Standard deviation: 15299.02973
Coefficient of variation (CV): 0.5039793711
Kurtosis: -1.127910405
Mean: 30356.46021
Median Absolute Deviation (MAD): 12012
Skewness: -0.07283810347
Sum: 2.779999091 × 10^10
Variance: 234060310.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count     Frequency (%)
27171                368       < 0.1%
37125                368       < 0.1%
13223                368       < 0.1%
42133                368       < 0.1%
37151                368       < 0.1%
26075                368       < 0.1%
48375                368       < 0.1%
6037                 368       < 0.1%
41029                368       < 0.1%
29213                368       < 0.1%
Other values (2828)  912105    21.2%
(Missing)            3384928   78.7%

Value                Count     Frequency (%)
1001                 362       < 0.1%
1003                 368       < 0.1%
1005                 343       < 0.1%
1007                 343       < 0.1%
1009                 362       < 0.1%

Value                Count     Frequency (%)
56045                244       < 0.1%
56043                251       < 0.1%
56041                343       < 0.1%
56039                343       < 0.1%
56037                362       < 0.1%

place_id
Categorical

HIGH CARDINALITY

Distinct: 13276
Distinct (%): 0.3%
Missing: 736
Missing (%): < 0.1%
Memory size: 32.8 MiB

ChIJA-Vp4NdDXDkRni-uywd-c1o: 368
ChIJG-W5i09HU4YRiUwVqIdjhGw: 368
ChIJCRlcH4ezhIARAFpUT3LUXyQ: 368
ChIJlXSYuxF9pgARGX5qiWqW7nk: 368
ChIJjVxms-FrAYcRwPvQAzP5JaQ: 368
Other values (13271): 4298137

Length

Max length: 27
Median length: 27
Mean length: 27
Min length: 27

Characters and Unicode

Total characters: 116099379
Distinct characters: 64
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): < 0.1%

Sample

1st row: ChIJvRKrsd9IXj4RpwoIwFYv0zM
2nd row: ChIJvRKrsd9IXj4RpwoIwFYv0zM
3rd row: ChIJvRKrsd9IXj4RpwoIwFYv0zM
4th row: ChIJvRKrsd9IXj4RpwoIwFYv0zM
5th row: ChIJvRKrsd9IXj4RpwoIwFYv0zM
Value                        Count     Frequency (%)
ChIJA-Vp4NdDXDkRni-uywd-c1o  368       < 0.1%
ChIJG-W5i09HU4YRiUwVqIdjhGw  368       < 0.1%
ChIJCRlcH4ezhIARAFpUT3LUXyQ  368       < 0.1%
ChIJlXSYuxF9pgARGX5qiWqW7nk  368       < 0.1%
ChIJjVxms-FrAYcRwPvQAzP5JaQ  368       < 0.1%
ChIJu-SH28MJxkcRJYI2wf63IME  368       < 0.1%
ChIJezSAEFMr1BIRq1kgW7rDxro  368       < 0.1%
ChIJoyODwTLIV5QR5GZiwFe2FL8  368       < 0.1%
ChIJE4nvwn3UkZURFAOLNknhvCQ  368       < 0.1%
ChIJ7Rc_pRMuH0cRQMo4lLefAgM  368       < 0.1%
Other values (13266)         4296297   99.9%
(Missing)                    736       < 0.1%
Histogram of lengths of the category
Value                        Count     Frequency (%)
chijd9w2lenaiograyhzlrtppyy  368       < 0.1%
chijmxk1pawyiw0r9rwem_sscfq  368       < 0.1%
chijj614yjyawrqrvaumrvoqy1u  368       < 0.1%
chij5q-ziccniocrbgkeukqw_ys  368       < 0.1%
chijd8t--swfwxqrlpz1nfcdn5y  368       < 0.1%
chij2q9mibf0qygrdf2if6v8nkc  368       < 0.1%
chijb8o3hhcfddkrjj7wl7bboei  368       < 0.1%
chijc1b7fc3s3ikru6ik0rybvt4  368       < 0.1%
chijfsxg-ercpgar4mudwyceiee  368       < 0.1%
chijl1dm9flnc0crykkuzg-vaam  368       < 0.1%
Other values (13266)         4296297   99.9%

Most occurring characters

ValueCountFrequency (%)
I6189468
 
5.3%
R5765578
 
5.0%
J5650376
 
4.9%
h5568893
 
4.8%
C5556987
 
4.8%
c2554210
 
2.2%
Q2459288
 
2.1%
A2340226
 
2.0%
g2262437
 
1.9%
Y2220616
 
1.9%
Other values (54)75531300
65.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter56770019
48.9%
Lowercase Letter41736825
35.9%
Decimal Number15023281
 
12.9%
Connector Punctuation1292450
 
1.1%
Dash Punctuation1276804
 
1.1%

Most frequent character per category

ValueCountFrequency (%)
I6189468
 
10.9%
R5765578
 
10.2%
J5650376
 
10.0%
C5556987
 
9.8%
Q2459288
 
4.3%
A2340226
 
4.1%
Y2220616
 
3.9%
U2170329
 
3.8%
E1899681
 
3.3%
M1866890
 
3.3%
Other values (16)20650580
36.4%
ValueCountFrequency (%)
h5568893
 
13.3%
c2554210
 
6.1%
g2262437
 
5.4%
k2035679
 
4.9%
o1814585
 
4.3%
w1771753
 
4.2%
s1616151
 
3.9%
x1554392
 
3.7%
z1411150
 
3.4%
a1345561
 
3.2%
Other values (16)19802014
47.4%
ValueCountFrequency (%)
42007435
13.4%
01896556
12.6%
81627515
10.8%
51514607
10.1%
11361479
9.1%
21360623
9.1%
31358992
9.0%
71337900
8.9%
61304676
8.7%
91253498
8.3%
ValueCountFrequency (%)
_1292450
100.0%
ValueCountFrequency (%)
-1276804
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin98506844
84.8%
Common17592535
 
15.2%

Most frequent character per script

ValueCountFrequency (%)
I6189468
 
6.3%
R5765578
 
5.9%
J5650376
 
5.7%
h5568893
 
5.7%
C5556987
 
5.6%
c2554210
 
2.6%
Q2459288
 
2.5%
A2340226
 
2.4%
g2262437
 
2.3%
Y2220616
 
2.3%
Other values (42)57938765
58.8%
ValueCountFrequency (%)
42007435
11.4%
01896556
10.8%
81627515
9.3%
51514607
8.6%
11361479
7.7%
21360623
7.7%
31358992
7.7%
71337900
7.6%
61304676
7.4%
_1292450
7.3%
Other values (2)2530302
14.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII116099379
100.0%

Most frequent character per block

ValueCountFrequency (%)
I6189468
 
5.3%
R5765578
 
5.0%
J5650376
 
4.9%
h5568893
 
4.8%
C5556987
 
4.8%
c2554210
 
2.2%
Q2459288
 
2.1%
A2340226
 
2.0%
g2262437
 
1.9%
Y2220616
 
1.9%
Other values (54)75531300
65.1%

date
Categorical

HIGH CARDINALITY

Distinct: 368
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.8 MiB

2020-11-25: 13147
2020-11-27: 13144
2020-11-11: 13137
2020-11-24: 13135
2020-07-09: 13131
Other values (363): 4235019

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 43007130
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-02-15
2nd row: 2020-02-16
3rd row: 2020-02-17
4th row: 2020-02-18
5th row: 2020-02-19
Value               Count     Frequency (%)
2020-11-25          13147     0.3%
2020-11-27          13144     0.3%
2020-11-11          13137     0.3%
2020-11-24          13135     0.3%
2020-07-09          13131     0.3%
2020-12-29          13130     0.3%
2020-11-18          13128     0.3%
2021-01-27          13128     0.3%
2020-06-23          13128     0.3%
2020-12-30          13127     0.3%
Other values (358)  4169378   96.9%
Histogram of lengths of the category
Value               Count     Frequency (%)
2020-11-25          13147     0.3%
2020-11-27          13144     0.3%
2020-11-11          13137     0.3%
2020-11-24          13135     0.3%
2020-07-09          13131     0.3%
2020-12-29          13130     0.3%
2020-11-18          13128     0.3%
2021-01-27          13128     0.3%
2020-06-23          13128     0.3%
2020-12-30          13127     0.3%
Other values (358)  4169378   96.9%

Most occurring characters

ValueCountFrequency (%)
013220183
30.7%
211170627
26.0%
-8601426
20.0%
14438722
 
10.3%
3978631
 
2.3%
7803024
 
1.9%
6797214
 
1.9%
5782610
 
1.8%
4770271
 
1.8%
9730801
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number34405704
80.0%
Dash Punctuation8601426
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
013220183
38.4%
211170627
32.5%
14438722
 
12.9%
3978631
 
2.8%
7803024
 
2.3%
6797214
 
2.3%
5782610
 
2.3%
4770271
 
2.2%
9730801
 
2.1%
8713621
 
2.1%
ValueCountFrequency (%)
-8601426
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common43007130
100.0%

Most frequent character per script

ValueCountFrequency (%)
013220183
30.7%
211170627
26.0%
-8601426
20.0%
14438722
 
10.3%
3978631
 
2.3%
7803024
 
1.9%
6797214
 
1.9%
5782610
 
1.8%
4770271
 
1.8%
9730801
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII43007130
100.0%

Most frequent character per block

ValueCountFrequency (%)
013220183
30.7%
211170627
26.0%
-8601426
20.0%
14438722
 
10.3%
3978631
 
2.3%
7803024
 
1.9%
6797214
 
1.9%
5782610
 
1.8%
4770271
 
1.8%
9730801
 
1.7%

retail_and_recreation
Real number (ℝ)

MISSING

Distinct: 425
Distinct (%): < 0.1%
Missing: 1598804
Missing (%): 37.2%
Infinite: 0
Infinite (%): 0.0%
Mean: -23.61628537
Minimum: -100
Maximum: 545
Zeros: 39043
Zeros (%): 0.9%
Memory size: 32.8 MiB

Quantile statistics

Minimum: -100
5-th percentile: -75
Q1: -41
median: -20
Q3: -5
95-th percentile: 14
Maximum: 545
Range: 645
Interquartile range (IQR): 36
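
The last two rows follow directly from the quantiles. A short check, using the usual definitions (range = maximum - minimum, IQR = Q3 - Q1); the variable names are illustrative:

```python
# Quantile figures for retail_and_recreation, copied from this report.
minimum, q1, q3, maximum = -100, -41, -5, 545

value_range = maximum - minimum   # spread between the two extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
```

The range (645) is about 18 times the IQR (36), which suggests the maximum is driven by a few extreme observations rather than the bulk of the data.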

Descriptive statistics

Standard deviation: 27.64471363
Coefficient of variation (CV): -1.170578404
Kurtosis: 2.771897863
Mean: -23.61628537
Median Absolute Deviation (MAD): 17
Skewness: -0.002345781175
Sum: -63809054
Variance: 764.2301919
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count     Frequency (%)
-10                 45531     1.1%
-11                 45362     1.1%
-12                 45321     1.1%
-8                  45313     1.1%
-9                  45087     1.0%
-13                 45021     1.0%
-14                 44930     1.0%
-7                  44507     1.0%
-6                  44343     1.0%
-16                 44178     1.0%
Other values (415)  2252316   52.4%
(Missing)           1598804   37.2%

Value               Count     Frequency (%)
-100                198       < 0.1%
-99                 83        < 0.1%
-98                 447       < 0.1%
-97                 948       < 0.1%
-96                 1515      < 0.1%

Value               Count     Frequency (%)
545                 1         < 0.1%
533                 1         < 0.1%
527                 2         < 0.1%
512                 1         < 0.1%
507                 1         < 0.1%

grocery_and_pharmacy
Real number (ℝ)

MISSING
ZEROS

Distinct: 537
Distinct (%): < 0.1%
Missing: 1694251
Missing (%): 39.4%
Infinite: 0
Infinite (%): 0.0%
Mean: -3.029733409
Minimum: -100
Maximum: 615
Zeros: 69501
Zeros (%): 1.6%
Memory size: 32.8 MiB

Quantile statistics

Minimum: -100
5-th percentile: -45
Q1: -14
median: -2
Q3: 9
95-th percentile: 33
Maximum: 615
Range: 715
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 24.57067163
Coefficient of variation (CV): -8.109846087
Kurtosis: 11.80407405
Mean: -3.029733409
Median Absolute Deviation (MAD): 11
Skewness: 0.4747040403
Sum: -7896885
Variance: 603.7179044
Monotonicity: Not monotonic
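
Two of these statistics can be recomputed from the others. The sketch below assumes the conventional definitions (CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative.

```python
# Figures for grocery_and_pharmacy, copied from this report.
mean = -3.029733409
std = 24.57067163

cv = std / mean        # negative because the mean is negative
variance = std ** 2

print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.2f}")
```

Because the mean is close to zero relative to the spread, the CV is large in magnitude (about -8.11) and says little by itself; the standard deviation or IQR is easier to interpret for this variable.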
Histogram with fixed size bins (bins=50)
Value               Count     Frequency (%)
0                   69501     1.6%
-1                  68734     1.6%
-2                  68528     1.6%
1                   67801     1.6%
2                   66858     1.6%
-3                  66391     1.5%
-4                  64922     1.5%
3                   64573     1.5%
-5                  63172     1.5%
4                   61890     1.4%
Other values (527)  1944092   45.2%
(Missing)           1694251   39.4%

Value               Count     Frequency (%)
-100                152       < 0.1%
-99                 1         < 0.1%
-98                 45        < 0.1%
-97                 250       < 0.1%
-96                 654       < 0.1%

Value               Count     Frequency (%)
615                 1         < 0.1%
538                 1         < 0.1%
533                 1         < 0.1%
527                 1         < 0.1%
524                 1         < 0.1%

parks
Real number (ℝ)

MISSING

Distinct: 855
Distinct (%): < 0.1%
Missing: 2235583
Missing (%): 52.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -10.37651237
Minimum: -100
Maximum: 1206
Zeros: 19161
Zeros (%): 0.4%
Memory size: 32.8 MiB

Quantile statistics

Minimum: -100
5-th percentile: -78
Q1: -43
median: -18
Q3: 10
95-th percentile: 83
Maximum: 1206
Range: 1306
Interquartile range (IQR): 53

Descriptive statistics

Standard deviation: 53.79536302
Coefficient of variation (CV): -5.184339504
Kurtosis: 14.7527523
Mean: -10.37651237
Median Absolute Deviation (MAD): 26
Skewness: 2.400233603
Sum: -21428847
Variance: 2893.941082
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
-27                 21323    0.5%
-25                 21281    0.5%
-21                 21199    0.5%
-22                 21140    0.5%
-24                 21140    0.5%
-26                 21125    0.5%
-29                 21111    0.5%
-28                 21089    0.5%
-20                 21012    0.5%
-18                 21001    0.5%
Other values (845)  1853709  43.1%
(Missing)           2235583  52.0%

Value               Count    Frequency (%)
-100                3634     0.1%
-99                 288      < 0.1%
-98                 836      < 0.1%
-97                 1183     < 0.1%
-96                 1540     < 0.1%

Value               Count    Frequency (%)
1206                1        < 0.1%
1187                1        < 0.1%
1146                1        < 0.1%
1136                1        < 0.1%
1113                1        < 0.1%

transit_stations
Real number (ℝ)

MISSING

Distinct: 540
Distinct (%): < 0.1%
Missing: 2127730
Missing (%): 49.5%
Infinite: 0
Infinite (%): 0.0%
Mean: -27.43854002
Minimum: -100
Maximum: 554
Zeros: 22407
Zeros (%): 0.5%
Memory size: 32.8 MiB

Quantile statistics

Minimum: -100
5-th percentile: -73
Q1: -48
median: -28
Q3: -8
95-th percentile: 18
Maximum: 554
Range: 654
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 29.92665808
Coefficient of variation (CV): -1.090679681
Kurtosis: 7.340301756
Mean: -27.43854002
Median Absolute Deviation (MAD): 20
Skewness: 0.9435085961
Sum: -59623481
Variance: 895.6048636
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
-35                 28940    0.7%
-29                 28919    0.7%
-33                 28859    0.7%
-32                 28790    0.7%
-30                 28761    0.7%
-28                 28734    0.7%
-34                 28575    0.7%
-36                 28575    0.7%
-31                 28495    0.7%
-25                 28338    0.7%
Other values (530)  1885997  43.9%
(Missing)           2127730  49.5%

Value               Count    Frequency (%)
-100                1778     < 0.1%
-99                 7        < 0.1%
-98                 74       < 0.1%
-97                 254      < 0.1%
-96                 526      < 0.1%

Value               Count    Frequency (%)
554                 1        < 0.1%
524                 1        < 0.1%
520                 1        < 0.1%
506                 1        < 0.1%
505                 1        < 0.1%

workplaces
Real number (ℝ)

MISSING
ZEROS

Distinct: 325
Distinct (%): < 0.1%
Missing: 198009
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: -20.03532012
Minimum: -100
Maximum: 260
Zeros: 67235
Zeros (%): 1.6%
Memory size: 32.8 MiB

Quantile statistics

Minimum: -100
5-th percentile: -57
Q1: -32
median: -19
Q3: -6
95-th percentile: 9
Maximum: 260
Range: 360
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 20.13046196
Coefficient of variation (CV): -1.004748706
Kurtosis: 0.9715821669
Mean: -20.03532012
Median Absolute Deviation (MAD): 13
Skewness: -0.3969940416
Sum: -82198988
Variance: 405.2354986
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
-22                 85427    2.0%
-21                 85364    2.0%
-20                 85349    2.0%
-23                 84869    2.0%
-19                 84833    2.0%
-24                 84831    2.0%
-18                 84654    2.0%
-17                 84019    2.0%
-16                 83089    1.9%
-25                 82891    1.9%
Other values (315)  3257378  75.7%
(Missing)           198009   4.6%

Value               Count    Frequency (%)
-100                28       < 0.1%
-99                 8        < 0.1%
-97                 6        < 0.1%
-96                 2        < 0.1%
-95                 5        < 0.1%

Value               Count    Frequency (%)
260                 1        < 0.1%
258                 1        < 0.1%
257                 1        < 0.1%
254                 1        < 0.1%
248                 1        < 0.1%

residential
Real number (ℝ)

MISSING
ZEROS

Distinct: 107
Distinct (%): < 0.1%
Missing: 1798617
Missing (%): 41.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.331561219
Minimum: -46
Maximum: 65
Zeros: 91313
Zeros (%): 2.1%
Memory size: 32.8 MiB

Quantile statistics

Minimum: -46
5-th percentile: -1
Q1: 4
median: 8
Q3: 14
95-th percentile: 24
Maximum: 65
Range: 111
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 7.789884981
Coefficient of variation (CV): 0.8347890346
Kurtosis: 1.163721839
Mean: 9.331561219
Median Absolute Deviation (MAD): 5
Skewness: 0.7758349242
Sum: 23348462
Variance: 60.68230802
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
7                   148902   3.5%
6                   146996   3.4%
8                   146372   3.4%
5                   140094   3.3%
9                   138896   3.2%
10                  130285   3.0%
4                   125221   2.9%
11                  119543   2.8%
3                   112310   2.6%
12                  108413   2.5%
Other values (97)   1185064  27.6%
(Missing)           1798617  41.8%

Value               Count    Frequency (%)
-46                 4        < 0.1%
-45                 2        < 0.1%
-43                 1        < 0.1%
-40                 1        < 0.1%
-39                 1        < 0.1%

Value               Count    Frequency (%)
65                  1        < 0.1%
64                  2        < 0.1%
63                  1        < 0.1%
62                  1        < 0.1%
61                  1        < 0.1%

Interactions

[Figures: 42 interaction plots, not reproduced here.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
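
The calculation described above can be sketched in plain Python. The function name `pearson_r` and the sample values are illustrative assumptions, not part of the report or of any library, and the numbers are not taken from the dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Hypothetical values, not taken from the dataset.
workplaces = [-29.0, -31.0, -27.0, -30.0, -19.0]
residential = [12.0, 13.0, 11.0, 12.0, 8.0]

r = pearson_r(workplaces, residential)
# Shifting and rescaling one variable (with a positive scale) leaves r unchanged.
r_shifted = pearson_r([2 * x + 100 for x in workplaces], residential)
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale mentioned above.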

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
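
As a rough sketch of this procedure, the code below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their rank positions is an assumption here (it is a common convention), and all function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this monotonic but nonlinear relationship, ρ reaches its maximum while Pearson's r stays below 1.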

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
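
A direct, quadratic-time sketch of this pair-counting definition follows. It implements the formula exactly as stated above (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries often apply a tie-corrected variant instead, so results on data with many ties may differ. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in xs or ys: counted in neither group.
    return (concordant - discordant) / len(pairs)

# Hypothetical values: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

Because every pair of observations is compared, this approach is only practical for small samples; it is meant to show the definition, not to scale to millions of rows.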

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. See the Phik project documentation for details.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
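
One way to read nullity correlation is as the Pearson correlation between the "is missing" indicators of two columns. The sketch below assumes that interpretation and represents missing values as `None`; the function name and sample columns are hypothetical. With pandas, a comparable matrix is commonly obtained with `df.isnull().corr()`.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no variation.
        return float("nan")
    return cov / sqrt(var_a * var_b)

# Hypothetical columns; None marks a missing observation.
parks = [5.0, None, None, 4.0, None, 6.0]
transit_stations = [0.0, None, None, 1.0, None, 1.0]
complement = [None, 2.0, 3.0, None, 1.0, None]

print(nullity_correlation(parks, transit_stations))  # missing together
print(nullity_correlation(parks, complement))        # one missing when the other is present
```

A value near 1 means the two variables tend to be missing in the same rows, a value near -1 means one tends to be present exactly when the other is absent.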
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

   country_region_code  country_region        sub_region_1  sub_region_2  metro_area  iso_3166_2_code  census_fips_code  place_id                     date        retail_and_recreation  grocery_and_pharmacy  parks  transit_stations  workplaces  residential
0  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-15  0.0                    4.0                   5.0    0.0               2.0         1.0
1  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-16  1.0                    4.0                   4.0    1.0               2.0         1.0
2  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-17  -1.0                   1.0                   5.0    1.0               2.0         1.0
3  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-18  -2.0                   1.0                   5.0    0.0               2.0         1.0
4  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-19  -2.0                   0.0                   4.0    -1.0              2.0         1.0
5  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-20  -2.0                   1.0                   6.0    1.0               1.0         1.0
6  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-21  -3.0                   2.0                   6.0    0.0               -1.0        1.0
7  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-22  -2.0                   2.0                   4.0    -2.0              3.0         1.0
8  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-23  -1.0                   3.0                   3.0    -1.0              4.0         1.0
9  AE                   United Arab Emirates  NaN           NaN           NaN         NaN              NaN               ChIJvRKrsd9IXj4RpwoIwFYv0zM  2020-02-24  -3.0                   0.0                   5.0    -1.0              3.0         1.0

Last rows

         country_region_code  country_region  sub_region_1       sub_region_2  metro_area  iso_3166_2_code  census_fips_code  place_id                     date        retail_and_recreation  grocery_and_pharmacy  parks  transit_stations  workplaces  residential
4300703  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-03  NaN                    NaN                   NaN    NaN               -29.0       NaN
4300704  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-04  NaN                    NaN                   NaN    NaN               -31.0       NaN
4300705  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-05  NaN                    NaN                   NaN    NaN               -27.0       NaN
4300706  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-08  NaN                    NaN                   NaN    NaN               -30.0       NaN
4300707  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-09  NaN                    NaN                   NaN    NaN               -19.0       NaN
4300708  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-10  NaN                    NaN                   NaN    NaN               -25.0       NaN
4300709  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-11  NaN                    NaN                   NaN    NaN               -28.0       NaN
4300710  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-12  NaN                    NaN                   NaN    NaN               -25.0       NaN
4300711  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-15  NaN                    NaN                   NaN    NaN               -31.0       NaN
4300712  ZW                   Zimbabwe        Midlands Province  Kwekwe        NaN         NaN              NaN               ChIJRcIZ3-FJNBkRRsj55IcLpfU  2021-02-16  NaN                    NaN                   NaN    NaN               -24.0       NaN